Compiling Tree Transforms to Operate on Packed Representations
When written idiomatically in most programming languages, programs
that traverse and construct trees operate over pointer-based data
structures, using one heap object per leaf and per node. This
representation is efficient for random access and shape-changing
modifications, but for traversals, such as compiler passes, that
process most or all of a tree in bulk, it can be inefficient. In this
work we instead compile tree traversals to operate on
pointer-free pre-order serializations of trees. On modern
architectures such programs often run significantly faster than
their pointer-based counterparts, and additionally are directly suited
to storage and transmission without requiring marshaling.
We present a prototype compiler, Gibbon, that compiles a
small first-order, purely functional language sufficient for tree
traversals. The compiler transforms this language into an intermediate
representation with explicit pointers into input and output buffers
for packed data. The key compiler technologies include an effect
system for capturing traversal behavior, combined with an algorithm to
insert destination cursors. We evaluate our compiler on tree
transformations over a real-world dataset of source-code syntax trees.
For traversals touching the whole tree, such as maps and folds, packed
data allows speedups of over 2x compared to a highly optimized
pointer-based baseline.
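To make the idea of a pointer-free pre-order serialization concrete, here is a minimal sketch in Python. It is not Gibbon's actual encoding or generated code: the one-byte tags, the 8-byte little-endian leaves, and the names `pack`, `sum_packed`, and `add1_packed` are all assumptions made for illustration. The fold reads the buffer through an input cursor, and the map also appends to an output buffer, loosely mirroring the input and output cursors the abstract mentions.

```python
import struct

# Hypothetical one-byte tags; the real Gibbon encoding may differ.
LEAF, NODE = 0, 1


def pack(tree, out):
    """Append the pre-order serialization of `tree` to the bytearray `out`.

    A tree is either an int (leaf) or a (left, right) tuple (interior node).
    """
    if isinstance(tree, int):
        out.append(LEAF)
        out += struct.pack("<q", tree)  # 8-byte signed little-endian payload
    else:
        left, right = tree
        out.append(NODE)
        pack(left, out)   # the left subtree directly follows the tag
        pack(right, out)  # the right subtree directly follows the left one
    return out


def sum_packed(buf, pos=0):
    """Fold: sum all leaves of the tree at `pos`; return (sum, next position)."""
    if buf[pos] == LEAF:
        (value,) = struct.unpack_from("<q", buf, pos + 1)
        return value, pos + 9
    left_sum, pos = sum_packed(buf, pos + 1)
    right_sum, pos = sum_packed(buf, pos)
    return left_sum + right_sum, pos


def add1_packed(src, pos, dst):
    """Map: append a copy of the tree at `pos` with every leaf incremented.

    Returns the input position just past the tree that was read.
    """
    tag = src[pos]
    dst.append(tag)
    if tag == LEAF:
        (value,) = struct.unpack_from("<q", src, pos + 1)
        dst += struct.pack("<q", value + 1)
        return pos + 9
    pos = add1_packed(src, pos + 1, dst)
    return add1_packed(src, pos, dst)
```

Both traversals walk the buffer strictly left to right and never follow a pointer, which is the access pattern the abstract credits for the speedup. The packed buffer can also be written to disk or sent over a network as-is.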
LoCal: a language for programs operating on serialized data
In a typical data-processing program, the representation of data in memory is distinct from its representation in a serialized form on disk. The former has pointers and arbitrary, sparse layout, facilitating easy manipulation by a program, while the latter is packed contiguously, facilitating easy I/O. We propose a language, LoCal, to unify in-memory and serialized formats. LoCal extends a region calculus into a location calculus, employing a type system that tracks the byte-addressed layout of all heap values. We formalize LoCal and prove type safety, and show how LoCal programs can be inferred from unannotated source terms.
We transform the existing Gibbon compiler to use LoCal as an intermediate language. The goal is to balance code speed and data compactness by introducing just enough indirection into heap layouts to preserve the asymptotic complexity of traditional representations, while still working with mostly or completely serialized data. We show that our approach yields a significant performance improvement over prior approaches to operating on packed data, without abandoning idiomatic programming with recursive functions.
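The role of indirection in a packed layout can be illustrated with a small Python sketch. This is an assumed encoding, not LoCal's actual layout or type system: each interior node stores a 4-byte absolute offset to its right child, and the names `pack_with_offsets` and `rightmost_leaf` are hypothetical. Without such an offset, reaching a right child in a pre-order buffer means scanning the whole left subtree. With it, the walk costs one jump per level, as it would with pointers.

```python
import struct

# Hypothetical one-byte tags, chosen for this sketch only.
LEAF, NODE = 0, 1


def pack_with_offsets(tree, out):
    """Serialize `tree` in pre-order, storing the right child's offset per node.

    A tree is either an int (leaf) or a (left, right) tuple (interior node).
    """
    if isinstance(tree, int):
        out.append(LEAF)
        out += struct.pack("<q", tree)
        return out
    left, right = tree
    out.append(NODE)
    slot = len(out)
    out += b"\x00\x00\x00\x00"  # placeholder, patched once the left subtree is written
    pack_with_offsets(left, out)
    struct.pack_into("<I", out, slot, len(out))  # absolute position of the right child
    pack_with_offsets(right, out)
    return out


def rightmost_leaf(buf):
    """Return the rightmost leaf value, jumping over every left subtree."""
    pos = 0
    while buf[pos] == NODE:
        (pos,) = struct.unpack_from("<I", buf, pos + 1)
    (value,) = struct.unpack_from("<q", buf, pos + 1)
    return value
```

This sketch puts an offset on every interior node for simplicity. The abstract instead describes adding only as much indirection as is needed, so a real layout would presumably omit offsets where traversals never skip a subtree.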
Techniques for Automatic Fusion of General Tree Traversals
Trees are common data structures that are used in many programs and applications. In its simplest form, a binary tree can be used to store numbers in sorted order. Kd-trees, render trees and abstract syntax trees are more sophisticated examples of tree structures. Furthermore, in functional programming, algebraic data types are essentially tree structures as well. In several tree-based applications, a tree is constructed, and several traversals then visit the tree to perform different computations. Traversal fusion is a transformation that combines different traversals of the same tree and performs them together, ideally in a single traversal. Traversal fusion has several performance benefits, such as reducing traversal overhead and memory accesses, enhancing locality, and eliminating intermediate structures. Previous work on fusion has mostly been successful only in specific domains or limited scopes. This work introduces novel techniques for performing fusion in both imperative and functional programming settings, with a focus on generality. The new techniques target general traversals, minimizing the burden on programmers and increasing the coverage of the transformation. Furthermore, they exploit fusion opportunities that previous approaches do not, achieving significant speedups for a wider range of programs.
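As a hand-written illustration of what traversal fusion achieves, consider the Python sketch below. It does not show the techniques this work introduces, which perform the transformation automatically. The `Node` class and the `scale`, `shift`, and `scale_then_shift_fused` functions are hypothetical names chosen for the example. Two in-place traversals that each visit every node are replaced by one traversal that does both updates per node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def scale(t: Optional[Node], k: int) -> None:
    """First traversal: multiply every value by k, in place."""
    if t is None:
        return
    t.value *= k
    scale(t.left, k)
    scale(t.right, k)


def shift(t: Optional[Node], d: int) -> None:
    """Second traversal: add d to every value, in place."""
    if t is None:
        return
    t.value += d
    shift(t.left, d)
    shift(t.right, d)


def scale_then_shift_fused(t: Optional[Node], k: int, d: int) -> None:
    """Fused traversal: same result as scale(t, k) followed by shift(t, d)."""
    if t is None:
        return
    t.value = t.value * k + d  # both updates while the node is already loaded
    scale_then_shift_fused(t.left, k, d)
    scale_then_shift_fused(t.right, k, d)
```

The fused version visits each node once instead of twice, which is where the reduced traversal overhead and better locality come from. Fusing by hand is easy here because each update touches only the current node. Traversals with dependences across nodes are harder to fuse safely.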